Using Sacks to Organize Registers in VLIW Machines
Abstract
This paper analyses the register requirements of software-pipelined inner loops. When the number of functional units and/or the number of stages of individual functional units is increased, the number of registers required may become prohibitive in chip area and cycle time. We characterize the lifetimes of values in pipelined loops by their loop register locality (LRL). Based on this characteristic, we propose a new organization of the register file that increases the number of registers without affecting cycle time, while also reducing area. This helps minimize the frequency of spills at a reasonable cost, which matters because spill code can increase the minimum initiation interval and decrease loop performance. The new organization consists of a small, high-bandwidth multiported register file and a low-bandwidth, port-limited register file called the sack. A mechanism to assign values to the sack is presented. We demonstrate the effectiveness of our approach by experimenting with a collection of loops from the Perfect Club benchmark suite. We also ran experiments to find the optimal number of registers in the sack, and we measured the effect of spill code on loop performance.
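The abstract does not define LRL or the sack assignment mechanism, so the following is only a minimal sketch of the underlying notion of register pressure in a software-pipelined loop, not the authors' method. It assumes each value is described by a hypothetical `(def_cycle, last_use_cycle)` pair within one iteration and that a new iteration starts every `ii` cycles; the function name `max_live` and the sample lifetimes are illustrative assumptions.

```python
from collections import Counter


def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in the steady state.

    lifetimes: (def_cycle, last_use_cycle) pairs for one iteration; a value
               is treated as live on cycles def_cycle .. last_use_cycle - 1
               (an assumed convention, not taken from the paper).
    ii:        initiation interval, i.e. cycles between iteration starts.
    """
    if ii <= 0:
        raise ValueError("initiation interval must be positive")
    pressure = Counter()
    for start, end in lifetimes:
        # Overlapped iterations repeat every ii cycles, so each live cycle
        # contributes to the register pressure of slot (cycle mod ii).
        for cycle in range(start, end):
            pressure[cycle % ii] += 1
    return max(pressure.values(), default=0)


# Hypothetical lifetimes for three values of one loop iteration.
example = [(0, 3), (1, 2), (2, 6)]
print(max_live(example, ii=6))  # iterations barely overlap
print(max_live(example, ii=2))  # tighter overlap, more live values
```

With the same lifetimes, a smaller initiation interval overlaps more iterations and raises the number of values that are live at once, which is the pressure on the register file that the abstract is concerned with.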
Similar resources
Improving DTSVLIW Performance via Block Compaction
Dynamically Trace Scheduled VLIW (DTSVLIW) machines have two execution engines and two instruction caches: a Scheduler Engine and a VLIW Engine, and an Instruction Cache and a VLIW Cache. The Scheduler Engine fetches instructions from the Instruction Cache and executes them singly, the first time, using a simple pipelined processor. In addition, it dynamically schedules the instruction trace ...
Method and apparatus for the selective scoreboarding of computation results
Statically scheduled machines do have a disadvantage when dealing with dynamic events, such as cache hit or miss detection. Early VLIW machines were designed without caches, to achieve predictability in memory access. However, such designs suffer in memory performance. To achieve high performance, VLIW architectures must have adequate support for using caches. A simple VLIW design might use an ...
Machine-Description Driven Compilers for EPIC and VLIW Processors
In the past, due to the restricted gate count available on an inexpensive chip, embedded DSPs have had limited parallelism, few registers and irregular, incomplete interconnectivity. More recently, with increasing levels of integration, embedded VLIW processors have started to appear. Such processors typically have higher levels of instruction-level parallelism, more registers, and a relatively...
An Operation Rearrangement Technique for Low-Power VLIW Instruction Fetch
As mobile applications are required to handle more computing-intensive tasks, many mobile devices are designed using VLIW processors for high performance. In VLIW machines where a single instruction contains multiple operations, the power consumption during instruction fetches varies significantly depending on how the operations are arranged within the instruction. In this paper, we describe a p...
DTSVLIW: VLIW Performance with Sequential Code
Due to the temporal execution locality present in programs, even small instruction caches (16-Kbyte) can provide processors with fast access to instructions most of the time. The Dynamically Trace Scheduled VLIW (DTSVLIW) architecture exploits programs’ temporal execution locality by executing code in two distinct modes. In the first execution encounter, fragments of the code are executed in ...